Smart Paper Notes

Clustering and Analysis of GPS Trajectory Data using Distance-based Features

Zann Koh, Yuren Zhou, Billy Pik Lik Lau, Ran Liu, Keng Hua Chong, Chau Yuen

Category: Machine Learning

2022-12-01

The proliferation of smartphones has accelerated mobility studies by greatly increasing the types and volume of mobility data available. One such source of mobility data is GPS technology, which is becoming increasingly common and helps the research community understand people's mobility patterns. However, there is no standardized framework for using machine learning methods to study the different mobility patterns created by the non-Work, non-Home locations of Working and Nonworking users on Workdays and Offdays. We propose a new mobility metric, Daily Characteristic Distance, and use it to generate features for each user together with Origin-Destination matrix features. We then feed those features to an unsupervised machine learning method, $k$-means clustering, and obtain three clusters of users for each type of day (Workday and Offday). Finally, we propose two new metrics for analyzing the clustering results, namely User Commonality and Average Frequency. Using the proposed metrics, we can discern interesting user behaviors and better understand the users' mobility patterns.
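
The abstract does not spell out how Daily Characteristic Distance or the Origin-Destination features are computed, so the sketch below illustrates only the clustering step: plain Lloyd's k-means over per-user feature vectors. The function name `kmeans` and the two-dimensional user features in the usage lines are hypothetical placeholders, not the paper's actual features or code.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of equal-length feature tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k randomly chosen input points
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so another update would not move the centers
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical per-user features: (a distance-like metric in km, share of non-Work, non-Home trips).
users = [(1.0, 0.1), (1.2, 0.2), (0.9, 0.15), (8.0, 0.9), (8.5, 0.8), (20.0, 0.5), (21.0, 0.55)]
labels, centers = kmeans(users, k=3)
```

In the paper's setting this step would be run separately on Workday and Offday features, yielding three clusters for each type of day.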

translated by Google Translate

GreenPLM: Cross-lingual pre-trained language models conversion with (almost) no cost

Qingcheng Zeng, Lucas Garay, Peilin Zhou, Dading Chong, Yining Hua, Jiageng Wu, Yikang Pan, Han Zhou, Jie Yang

Category: Natural Language Processing

2022-11-13

While large pre-trained models have transformed the field of natural language processing (NLP), their high training cost and low cross-lingual availability prevent these advances from being shared equally by users of all languages, especially less widely spoken ones. To promote equal opportunities for speakers of all languages in NLP research and to reduce energy consumption for sustainability, this study proposes GreenPLM, an effective and energy-efficient framework that uses bilingual lexicons to directly translate language models of one language into other languages at (almost) no additional cost. We validate this approach in 18 languages and show that the framework is comparable to, if not better than, other heuristics that are costly to train. In addition, when given a low computational cost (2.5%), the framework outperforms the original monolingual language models in six out of seven tested languages. We release language models in 50 languages translated from English, along with the source code.
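
The abstract says only that bilingual lexicons are used to translate a model from one language into another. One minimal way to picture this, assuming the transfer happens at the word-embedding layer and that a target word with several translations receives the average of their source embeddings, is the sketch below. `translate_embeddings`, the toy vectors, and the English-French lexicon are all hypothetical, and the rest of the network is assumed to be reused unchanged.

```python
def translate_embeddings(src_emb, lexicon, unk_vector):
    """Build a target-language embedding table from a source-language one.

    src_emb:    source token -> embedding vector (list of floats)
    lexicon:    target word -> list of its source-language translations
    unk_vector: fallback for target words with no embedded translation
    """
    tgt_emb = {}
    for tgt_word, translations in lexicon.items():
        vecs = [src_emb[s] for s in translations if s in src_emb]
        if vecs:
            # Average the source embeddings of every known translation.
            tgt_emb[tgt_word] = [sum(col) / len(vecs) for col in zip(*vecs)]
        else:
            tgt_emb[tgt_word] = list(unk_vector)
    return tgt_emb


src_emb = {"dog": [1.0, 0.0], "hound": [0.0, 1.0], "cat": [0.5, 0.5]}
lexicon = {"chien": ["dog", "hound"], "chat": ["cat"], "oiseau": ["bird"]}
tgt_emb = translate_embeddings(src_emb, lexicon, unk_vector=[0.0, 0.0])
```

Here "chien" averages the vectors of "dog" and "hound", while "oiseau" falls back to the unknown vector because "bird" has no source embedding.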

METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets

Peilin Zhou, Zeqiang Wang, Dading Chong, Zhijiang Guo, Yining Hua, Zichang Su, Zhiyang Teng, Jiageng Wu, Jie Yang

Category: Natural Language Processing

2022-09-28

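The COVID-19 pandemic continues to raise a variety of topics for discussion and debate on social media. To explore the impact of the pandemic on people's lives, it is crucial to understand the public's concerns and attitudes on social media toward pandemic-related entities (e.g., drugs, vaccines). However, models trained on existing named entity recognition (NER) or targeted sentiment analysis (TSA) datasets have a limited ability to understand COVID-19-related social media texts, because these datasets were not designed or annotated from a medical perspective. This paper releases METS-CoV, a dataset containing medical entities and targeted sentiments in COVID-19-related tweets. METS-CoV contains 10,000 tweets annotated with 7 types of entities, including 4 medical entity types (Disease, Drug, Symptom, and Vaccine) and 3 general entity types (Person, Location, and Organization). To further investigate tweet users' attitudes toward specific entities, 4 entity types (Person, Organization, Drug, and Vaccine) were selected and annotated with user sentiments, resulting in a targeted sentiment dataset of 9,101 entities (in 5,278 tweets). To the best of our knowledge, METS-CoV is the first dataset to collect medical entities and their corresponding sentiments from COVID-19-related tweets. We benchmark classical machine learning models and state-of-the-art deep learning models with extensive experiments. The results show that the dataset leaves ample room for improvement on both the NER and TSA tasks. METS-CoV is an important resource for developing better medical social media tools and for facilitating computational social science research, especially in epidemiology. Our data, annotation guidelines, benchmark models, and source code are publicly available (https://github.com/ylab-open/mets-cov) to ensure reproducibility.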
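
To make the two annotation layers concrete, the sketch below stores each entity as a token span with a type and an optional target sentiment, then derives per-token NER tags and (entity, sentiment) pairs for TSA. The BIO tagging scheme, the `Entity` class, the sentiment label strings, and the example tweet are assumptions for illustration; the abstract does not describe the released file format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entity:
    start: int                       # first token index (inclusive)
    end: int                         # last token index + 1 (exclusive)
    etype: str                       # e.g. "Vaccine", "Symptom", "Person"
    sentiment: Optional[str] = None  # set only for sentiment-annotated entity types


def to_bio(num_tokens: int, entities: List[Entity]) -> List[str]:
    """NER view: one BIO tag per token."""
    tags = ["O"] * num_tokens
    for ent in entities:
        tags[ent.start] = "B-" + ent.etype
        for i in range(ent.start + 1, ent.end):
            tags[i] = "I-" + ent.etype
    return tags


def sentiment_targets(tokens: List[str], entities: List[Entity]) -> List[Tuple[str, str]]:
    """TSA view: (entity text, sentiment) pairs for entities that carry a sentiment."""
    return [(" ".join(tokens[e.start:e.end]), e.sentiment) for e in entities if e.sentiment]


tokens = ["Got", "my", "Pfizer", "vaccine", "today", ",", "mild", "headache"]
entities = [Entity(2, 4, "Vaccine", sentiment="positive"), Entity(7, 8, "Symptom")]
tags = to_bio(len(tokens), entities)
pairs = sentiment_targets(tokens, entities)
```

Symptom is not one of the four sentiment-annotated types, so only the vaccine mention appears in the TSA pairs.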

PAN: Pulse Ansatz on NISQ Machines

Zhiding Liang, Jinglei Cheng, Hang Ren, Hanrui Wang, Fei Hua, Yongshan Ding, Fred Chong, Song Han, Yiyu Shi, Xuehai Qian

Category: Machine Learning

2022-08-02

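Variational quantum algorithms (VQAs) have demonstrated great potential in the NISQ era. In the workflow of a VQA, the parameters of the ansatz are iteratively updated to approximate the desired quantum state. Various efforts have been made to design better ansatzes with fewer gates. In a quantum computer, the gate ansatz is eventually transformed into control signals, such as microwave pulses on transmons, and these control pulses require elaborate calibration to minimize errors such as over-rotation and under-rotation. In the case of VQAs, this procedure introduces redundancy, but the variational nature of VQAs can naturally handle over-rotation and under-rotation by updating the amplitude and frequency parameters. We therefore propose PAN, a native-pulse ansatz generator framework for VQAs. We generate native-pulse ansatzes with trainable parameters for amplitudes and frequencies. In our proposed PAN, we tune parametric pulses, which are natively supported on NISQ computers. Because native-pulse ansatzes do not satisfy the parameter-shift rule, we need to deploy non-gradient optimizers. To limit the number of parameters sent to the optimizer, we adopt a progressive way of generating the native-pulse ansatz. Experiments are conducted on both simulators and quantum devices to validate our approach. When adopted on NISQ machines, PAN reduces latency by 86% on average. PAN achieves 99.336% and 96.482% accuracy on VQE tasks for H2 and HeH+, respectively, even with considerable noise in NISQ machines.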
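
The remark that native-pulse ansatzes break the parameter-shift rule can be checked numerically. For a gate parameter θ whose expectation has the form a + b·cos θ + c·sin θ (for example ⟨Z⟩ = cos θ after RX(θ) on |0⟩), the shifted difference [f(θ+s) - f(θ-s)] / (2 sin s) is the exact derivative; for a parameter that enters non-sinusoidally it is not, which is why a gradient-free optimizer is needed. The `cos(t*t)` "pulse" response and the quadratic stand-in cost below are made up for illustration, and compass search is just one simple gradient-free method, not necessarily the optimizer PAN uses.

```python
import math


def parameter_shift_grad(f, theta, shift=math.pi / 2):
    """Two-term parameter-shift rule; exact only for f = a + b*cos(theta) + c*sin(theta)."""
    return (f(theta + shift) - f(theta - shift)) / (2 * math.sin(shift))


def compass_search(cost, x0, step=0.5, tol=1e-6, max_iter=1000):
    """Gradient-free minimization: probe +/- step along each axis, halve step when stuck."""
    x = list(x0)
    fx = cost(x)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                f_trial = cost(trial)
                if f_trial < fx:  # accept the first improving move on this axis
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step /= 2
            if step < tol:
                break
    return x, fx


def gate_expectation(theta):
    return math.cos(theta)  # <Z> after RX(theta) acting on |0>


def pulse_expectation(t):
    return math.cos(t * t)  # hypothetical response that is not sinusoidal in t


def toy_cost(p):
    # Hypothetical cost over (amplitude, frequency), minimized at (0.8, 5.0).
    return (p[0] - 0.8) ** 2 + (p[1] - 5.0) ** 2


best, best_cost = compass_search(toy_cost, [0.0, 4.0])
```

For `gate_expectation` the rule matches -sin(θ), but `parameter_shift_grad(pulse_expectation, 1.0)` returns roughly 0 while the true derivative -2·sin(1) is about -1.68. Compass search needs only cost evaluations, so it does not depend on that rule.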

Asymmetric Co-teaching with Multi-view Consensus for Noisy Label Learning

Fengbei Liu, Yuanhong Chen, Chong Wang, Yu Tain, Gustavo Carneiro

Category: Computer Vision

2023-01-01

Learning with noisy labels has become an important research topic in computer vision, where state-of-the-art (SOTA) methods explore: 1) prediction disagreement with a co-teaching strategy that updates two models when they disagree on the prediction of training samples; and 2) sample selection, which divides the training set into clean and noisy sets based on small training loss. However, the quick convergence of co-teaching models to select the same clean subsets, combined with the relatively fast overfitting of noisy labels, may cause noisy-label samples to be wrongly selected as clean, leading to an inevitable confirmation bias that damages accuracy. In this paper, we introduce our noisy-label learning approach, called Asymmetric Co-teaching (AsyCo), which introduces a novel prediction disagreement that produces more consistently divergent results from the co-teaching models, and a new sample selection approach that does not require the small-loss assumption, enabling better robustness to confirmation bias than previous methods. More specifically, the new prediction disagreement is achieved with different training strategies: one model is trained with multi-class learning and the other with multi-label learning. The new sample selection is based on multi-view consensus, which uses the label views from training labels and model predictions to divide the training set into clean and noisy sets for training the multi-class model, and to re-label the training samples with multiple top-ranked labels for training the multi-label model. Extensive experiments on synthetic and real-world noisy-label datasets show that AsyCo improves over current SOTA methods.
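
The abstract does not give the exact consensus rule, so the sketch below is only a hypothetical version of agreement-based sample selection: a sample counts as clean when its training label matches the multi-class model's prediction and also appears among the multi-label model's top-ranked labels, and noisy samples receive the top-ranked labels as multi-label targets. Unlike small-loss selection, no loss threshold is involved. `consensus_split` and the toy inputs are illustrative assumptions, not AsyCo's published procedure.

```python
def consensus_split(train_labels, mc_preds, ml_topk):
    """Split sample indices into clean/noisy sets by agreement between label views.

    train_labels[i]: the (possibly noisy) training label of sample i
    mc_preds[i]:     the multi-class model's predicted class
    ml_topk[i]:      set of top-ranked classes from the multi-label model
    Returns (clean indices, noisy indices, multi-label targets for noisy samples).
    """
    clean, noisy, relabels = [], [], {}
    for i, (y, pred, top) in enumerate(zip(train_labels, mc_preds, ml_topk)):
        if y == pred and y in top:  # all three views agree on the training label
            clean.append(i)
        else:
            noisy.append(i)
            relabels[i] = sorted(top)  # re-label with the top-ranked classes
    return clean, noisy, relabels


clean, noisy, relabels = consensus_split(
    train_labels=[0, 1, 2, 1],
    mc_preds=[0, 1, 0, 2],
    ml_topk=[{0, 3}, {1}, {0, 2}, {1, 2}],
)
```

Samples 2 and 3 are flagged as noisy because the multi-class prediction disagrees with their training label, even though the label still appears in the multi-label top set.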

A Global Optimization Algorithm for K-Center Clustering of One Billion Samples

Jiayang Ren, Ningning You, Kaixun Hua, Chaojie Ji, Yankai Cao

Category: Machine Learning

2022-12-30

This paper presents a practical global optimization algorithm for the K-center clustering problem, which aims to select K samples as cluster centers so as to minimize the maximum within-cluster distance. The algorithm is based on a reduced-space branch-and-bound scheme and guarantees convergence to the global optimum in a finite number of steps by branching only on the regions of the centers. To improve efficiency, we design a two-stage decomposable lower bound whose solution can be derived in closed form. In addition, we propose several acceleration techniques to narrow down the region of centers, including bounds tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world datasets demonstrate that our algorithm can solve K-center problems to global optimality within 4 hours for ten million samples in serial mode and one billion samples in parallel mode. Moreover, compared with state-of-the-art heuristic methods, the global optimum obtained by our algorithm reduces the objective function by 25.8% on average across all the synthetic and real-world datasets.
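
For reference, the K-center objective and the classic farthest-first heuristic of Gonzalez can be written in a few lines. The heuristic has a factor-2 approximation guarantee and, like the paper's formulation, picks centers from the samples themselves, but it is not the paper's branch-and-bound global optimizer; the toy points are made up.

```python
import math


def kcenter_cost(points, centers):
    """K-center objective: largest distance from any point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def farthest_first(points, k):
    """Gonzalez's greedy heuristic: repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen center so far.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (20.0, 0.0), (21.0, 0.0)]
centers = farthest_first(pts, k=3)
print(kcenter_cost(pts, centers))  # 1.0
```

On this easy instance the greedy answer happens to be optimal; on harder data the gap between such heuristics and the global optimum is what the reported 25.8% reduction measures.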

From Single-Visit to Multi-Visit Image-Based Models: Single-Visit Models are Enough to Predict Obstructive Hydronephrosis

Stanley Bryan Z. Hua, Mandy Rickard, John Weaver, Alice Xiang, Daniel Alvarez, Kyla N. Velear, Kunj Sheth, Gregory E. Tasian, Armando J. Lorenzo, Anna Goldenberg

Category: Computer Vision | Artificial Intelligence

2022-12-27

Previous work has shown the potential of deep learning to predict renal obstruction using kidney ultrasound images. However, these image-based classifiers have been trained with single-visit inference in mind. We compare methods from video action recognition (convolutional pooling, LSTM, and TSM) to adapt single-visit convolutional models to multi-visit inference. We demonstrate that incorporating images from a patient's past hospital visits provides only a small benefit for the prediction of obstructive hydronephrosis. Therefore, although including prior ultrasounds is beneficial, prediction based on the latest ultrasound is sufficient for patient risk stratification.
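
Of the compared methods, TSM (Temporal Shift Module) is the least self-explanatory: it lets per-frame features exchange information by shifting a fraction of channels one step along the time axis, adding no parameters. A minimal pure-Python sketch of the bidirectional shift, treating each hospital visit as one time step, is below; the feature values and the `fold_div` setting are illustrative assumptions, and the paper's exact configuration is not given in the abstract.

```python
def temporal_shift(feats, fold_div=8):
    """Bidirectional temporal shift over a T x C list of per-visit feature vectors.

    Channels [0, fold) take their values from the next visit, channels
    [fold, 2*fold) from the previous visit, and the rest stay in place;
    positions with no neighbouring visit are zero-filled.
    """
    T, C = len(feats), len(feats[0])
    fold = C // fold_div
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:
                out[t][c] = feats[t + 1][c] if t + 1 < T else 0.0  # shift from the next visit
            elif c < 2 * fold:
                out[t][c] = feats[t - 1][c] if t > 0 else 0.0      # shift from the previous visit
            else:
                out[t][c] = feats[t][c]                            # unshifted channels
    return out


# Three visits with eight channels each; value 10*t + c makes the moves visible.
visits = [[10 * t + c for c in range(8)] for t in range(3)]
shifted = temporal_shift(visits, fold_div=4)
```

After the shift, a convolution applied to each visit's vector independently also sees some channels from neighbouring visits, which is how TSM mixes information across time.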

Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Neural Networks

Sitao Luan, Mingde Zhao, Chenqing Hua, Xiao-Wen Chang, Doina Precup

Category: Machine Learning | Artificial Intelligence

2022-12-21

The core operation of current Graph Neural Networks (GNNs) is the aggregation enabled by the graph Laplacian or message passing, which filters the neighborhood information of nodes. Although such aggregation is effective for various tasks, in this paper we show that it is potentially a problematic factor underlying all GNN models for learning on certain datasets, as it forces node representations to become similar, making nodes gradually lose their identity and become indistinguishable. Hence, we augment the aggregation operations with their dual, i.e., diversification operators that make nodes more distinct and preserve their identity. Such augmentation replaces the aggregation with a two-channel filtering process that, in theory, is beneficial for enriching the node representations. In practice, the proposed two-channel filters can easily be patched onto existing GNN methods with diverse training strategies, including spectral and spatial (message-passing) methods. In the experiments, we observe the desired characteristics of the models and significant performance boosts over the baselines on 9 node classification tasks.
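
The two channels can be illustrated with a row-normalized adjacency with self-loops, Â = D^-1 (A + I): aggregation computes ÂX, a low-pass filter that averages each node with its neighbours, and diversification computes (I - Â)X, a high-pass filter that keeps each node's difference from its neighbourhood. The fixed mixing weights below are a simplifying assumption; the abstract does not say how the paper combines the channels, and this is not its implementation.

```python
def normalized_adjacency(adj):
    """A_hat = D^-1 (A + I): each row averages a node with its neighbours."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in a]


def matmul(a, b):
    """Plain dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def two_channel_filter(adj, x, w_low=0.7, w_high=0.3):
    """Mix a low-pass (aggregation) and a high-pass (diversification) channel."""
    a_hat = normalized_adjacency(adj)
    low = matmul(a_hat, x)                                                   # A_hat X
    high = [[xv - lv for xv, lv in zip(xr, lr)] for xr, lr in zip(x, low)]  # (I - A_hat) X
    out = [[w_low * lo + w_high * hi for lo, hi in zip(lr, hr)] for lr, hr in zip(low, high)]
    return low, high, out


# Path graph 0 - 1 - 2 with one scalar feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [0.0], [1.0]]
low, high, out = two_channel_filter(adj, x)
```

The aggregation channel pulls node 1 toward its neighbours (about 0.67), while the diversification channel keeps its contrast with them (about -0.67), which is the identity-preserving signal the abstract describes.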

Dynamic Speed Guidance for CAV Ramp Merging in Non-Cooperative Environment: An On-Site Experiment

Wei Ji, Yechi Ma, Guangzhang Cui, Xiaotian Qin, Wei Hua

Category: Robotics

2022-12-21

Ramp merging is a typical application of cooperative intelligent transportation systems (C-ITS). Vehicle trajectories perceived by roadside sensors are an important complement to the limited field of view of on-board perception. This paper proposes a vehicle tracking and trajectory denoising algorithm that takes full advantage of roadside cameras to estimate vehicle trajectories and speed profiles. A dynamic speed guidance algorithm is also proposed to help on-ramp vehicles merge into the mainline smoothly, even in a non-cooperative environment where mainline vehicles are not expected to slow down to accommodate on-ramp vehicles. On-site experiments were carried out in a merging area of the Hangzhou Belt Highway to test our prototype system, and simulation analysis shows that our proposed algorithm can achieve significant fuel savings during the ramp merging process.
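
The abstract gives no formulas, so the sketch below is only a simplified illustration under stated assumptions. A least-squares line fit stands in for trajectory denoising when estimating a vehicle's speed from noisy roadside position samples. The guidance speed is the constant speed that brings the on-ramp vehicle to the merge point exactly when a chosen mainline gap arrives; treating the gap's speed as fixed mirrors the non-cooperative setting, where mainline vehicles do not slow down. All function names, speed limits, and numbers are hypothetical.

```python
def fit_speed(positions, dt):
    """Least-squares slope of position (m) vs. time (s): a simple denoised speed."""
    n = len(positions)
    times = [i * dt for i in range(n)]
    t_mean = sum(times) / n
    x_mean = sum(positions) / n
    num = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, positions))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def guidance_speed(ramp_dist, gap_dist, gap_speed, v_min=5.0, v_max=25.0):
    """Constant speed (m/s) that lets the ramp vehicle reach the merge point with the gap.

    ramp_dist: on-ramp vehicle's distance to the merge point (m)
    gap_dist:  distance of the chosen mainline gap to the merge point (m)
    gap_speed: the gap's speed (m/s), assumed constant (mainline does not yield)
    """
    t_gap = gap_dist / gap_speed                      # time until the gap reaches the merge point
    return min(v_max, max(v_min, ramp_dist / t_gap))  # clip to a plausible speed range


# Noisy positions of a mainline vehicle sampled every 0.1 s (about 20 m/s).
gap_speed = fit_speed([0.0, 2.1, 3.9, 6.1, 7.9], dt=0.1)
advice = guidance_speed(ramp_dist=150.0, gap_dist=198.0, gap_speed=gap_speed)
```

With these numbers the fitted speed is 19.8 m/s, the gap arrives in 10 s, and the advised ramp speed is 15 m/s.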

Query Enhanced Knowledge-Intensive Conversation via Unsupervised Joint Modeling

Mingzhu Cai, Siqi Bao, Xin Tian, Huang He, Fan Wang, Hua Wu

Category: Natural Language Processing

2022-12-19

The quality of knowledge retrieval is crucial in knowledge-intensive conversations. Two common strategies to improve retrieval quality are fine-tuning the retriever or generating a self-contained query, but these impose heavy burdens of expensive computation and elaborate annotation. In this paper, we propose an unsupervised query-enhanced approach for knowledge-intensive conversations, named QKConv. QKConv consists of three modules: a query generator, an off-the-shelf knowledge selector, and a response generator. Without extra supervision, the end-to-end joint training of QKConv explores multiple candidate queries and uses the correspondingly selected knowledge to yield the target response. To evaluate the effectiveness of the proposed method, we conducted comprehensive experiments on conversational question answering, task-oriented dialogue, and knowledge-grounded conversation. Experimental results demonstrate that QKConv achieves state-of-the-art performance among unsupervised methods and performance competitive with supervised methods.
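
One standard way to train a query-then-retrieve-then-respond pipeline without query labels is to treat the query as a latent variable and maximize the marginal likelihood of the target response over the candidate queries, log Σ_q p(q | c) · p(r | c, k_q). The abstract does not state QKConv's exact objective, so the sketch below, with hypothetical function names and probabilities, only shows how such a marginal and the resulting posterior over candidate queries can be computed stably.

```python
import math


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def marginal_over_queries(query_logps, response_logps):
    """Marginal log-likelihood of the response, summed over candidate queries.

    query_logps[i]:    log p(q_i | context) from the query generator
    response_logps[i]: log p(response | context, knowledge selected by q_i)
    Returns the log-marginal and each query's posterior share of it.
    """
    joint = [a + b for a, b in zip(query_logps, response_logps)]
    log_marginal = log_sum_exp(joint)
    posterior = [math.exp(j - log_marginal) for j in joint]
    return log_marginal, posterior


# Two equally likely candidate queries; the first retrieves more useful knowledge.
log_marginal, posterior = marginal_over_queries(
    [math.log(0.5), math.log(0.5)], [math.log(0.8), math.log(0.2)]
)
```

Here the marginal response probability is 0.5 × 0.8 + 0.5 × 0.2 = 0.5, and the first query receives a posterior weight of 0.8 because its knowledge explains the response better.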
